System and method for storing and retrieving filenames and files in computer memory

ABSTRACT

The invention receives a request to store a file having a filename written in a first text encoding, converts the filename into a Unicode filename and stores the Unicode filename and the file into memory. The invention then sets a flag, associated with the memory, indicating that a first text encoding has been used. To retrieve a Unicode filename, the invention receives a request to locate a Unicode filename from memory. Next, the invention uses a predetermined text encoding to convert the filename into Unicode. The invention then searches for the Unicode filename in the memory. If the Unicode filename is not found, the invention uses a next text encoding from the set of text encodings which have been used, to repeat the conversion and searches the memory until the Unicode filename is identified. Lastly, the Unicode file is retrieved.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates generally to computer operating systems and more particularly to storing and retrieving filenames in computer memory.

[0003] 2. Description of the Background Art

[0004] The storing and retrieving of filenames in computer memory is extremely important to all computer users. When a computer user saves a file and filename into computer memory, it is important that the filename remain uniquely identifiable regardless of any other filenames or text encodings saved in the memory. If a filename is not uniquely identifiable, then a computer may be unable to retrieve the named file. Further, if the memory containing the filename is moved to a different computer then that filename must remain identifiable if the named file is to be retrievable.

[0005] Conventionally, a filename identity is represented by a string of bytes (“encoding”) stored in computer memory. A conventional Roman character based computer system will interpret the encoding to represent Roman characters in the American Standard Code for Information Interchange (ASCII) character set, even if the encoding actually represents Japanese characters. For example, a Japanese computer user may save a file with a Japanese filename onto a removable memory device, such as a floppy disk. The Japanese filename encoding is interpreted by a conventional Japanese character based computer system to be Japanese characters. However, if the Japanese user then inserts the removable memory device into a conventional Roman character based computer system, the Roman computer system will assume the Japanese encoding actually represents a Roman character filename rather than a Japanese character filename.

[0006] A problem with the conventional Roman character based computer system is that because it assumes that a filename is in Roman characters, it may equate two non-Roman character filenames as being identical. This is because a Roman computer system treats uppercase and lowercase letters in a filename as equivalent. Therefore, a Roman computer system would assume that the filenames “Example.txt” and “example.txt” (and their associated files) are the same even though they are represented by different strings of bytes, possibly leading to the assumption that two non-Roman filenames, which vary only by case, are identical. If a Roman computer system misinterprets a non-Roman filename, the system may mistakenly open the wrong file or may refuse to create a new file since it believes that that filename is already in use.
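The collision described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the Roman system folds case byte by byte and that the Japanese filenames are stored in Shift-JIS (Python's "shift_jis" codec); neither detail is stated in this disclosure, and the helper name roman_fold is hypothetical.

```python
def roman_fold(raw_name: bytes) -> bytes:
    """Uppercase every byte that looks like an ASCII letter, regardless of
    what the byte actually encodes (assumed behavior of a Roman system)."""
    return raw_name.upper()


# Two assumed Shift-JIS filenames, one double-byte character each.
# They share the lead byte 0x83 and differ only in the trail byte:
# 0x41 is also ASCII "A", and 0x61 is also ASCII "a".
name_a = b"\x83\x41"
name_b = b"\x83\x61"

text_a = name_a.decode("shift_jis")
text_b = name_b.decode("shift_jis")

# Read as Japanese, the two names are different characters.
distinct_as_japanese = text_a != text_b

# Read as Roman text with case folding, the two byte strings collide.
same_to_roman_system = roman_fold(name_a) == roman_fold(name_b)
```

Under these assumptions the two names are distinct once decoded, yet compare equal after the Roman-style fold, which is the kind of false match the paragraph describes.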

[0007] FIG. 1 is a diagram of Japanese characters in which characters within any given column appear identical to a conventional Roman character based computer system. For example, characters 104, 106, 108 and 110 in column 102 appear identical to a prior art Roman computer system because it treats all filenames as if they were written in the Roman alphabet. Therefore, if two Japanese filenames differed by just one character, such as characters 104 and 106, a prior art Roman computer system would actually consider them to be identical. Similar problems occur with other text encodings but the problem is most acute in Japanese and Chinese text encodings since in these languages each character is a word and therefore filenames are shorter and more likely to vary by just one character.

[0008] A Roman character based prior art system can only store filenames in Roman text encodings as partially represented by ASCII text encoding table 200 of FIG. 2. Each Roman character has its own encoding. For instance, character 202, the letter “A”, is stored as 7-bit encoding 204. However, because ASCII only allows 7-bit encodings, which means that ASCII can encode only 128 characters, basic ASCII encoding table 200 contains no encodings for Japanese or any other language that uses non-Roman characters. Japanese and other East Asian languages can easily have several thousand characters that need to be encoded. Therefore, a prior art Roman character based computer system cannot always accurately store or retrieve some East Asian filenames or other non-Roman filenames.
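The 7-bit limit can be checked with a short sketch using Python's built-in "ascii" codec. The helper name ascii_encoding and the sample Chinese character are illustrative choices, not part of this disclosure.

```python
ASCII_CHARACTER_COUNT = 2 ** 7  # 7 bits allow 128 distinct encodings


def ascii_encoding(char: str):
    """Return the 7-bit ASCII value of a single character, or None when
    ASCII has no encoding for it."""
    try:
        return char.encode("ascii")[0]
    except UnicodeEncodeError:
        return None


letter_a = ascii_encoding("A")     # a Roman letter has an ASCII encoding
sky = ascii_encoding("\u5929")     # a Chinese character has none
```

The Roman letter maps to a value below 128, while the non-Roman character has no ASCII encoding at all.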

[0009] Therefore, an improved system and method are needed to store and retrieve filenames and files in a computer system.

SUMMARY OF THE INVENTION

[0010] The present invention provides a system and method for accurately storing and retrieving filenames in computer memory by converting filenames into Unicode text encoding. The Unicode Standard, like the ASCII text encoding standard and others, encodes each character as a numerical value. However, instead of encoding simply in ASCII, Unicode text encoding encodes all the characters used in the world's major written languages, including Greek, Arabic, Tamil, Thai, Japanese, Korean and many others.

[0011] The invention stores a filename into computer memory by first determining a default text encoding based upon which it converts the filename into Unicode text encoding. If the conversion is successful, the invention stores the Unicode text-encoded filename into computer memory and sets a bit that corresponds to the default text encoding in an Encoding Bitmap located in computer memory.

[0012] If the conversion based on the default text encoding is unsuccessful, the invention tries using Roman text encoding to convert the filename into Unicode text encoding. Once the conversion is complete, the invention stores the filename into computer memory and sets the bit that corresponds to Roman text encoding in the encoding bitmap. The invention assumes that any sequence of bytes can be converted to Unicode using Roman text encoding, which assigns a meaning to every possible byte sequence. If conversion using the default encoding fails, conversion using Roman text encoding will definitely succeed, even if it produces the wrong Unicode characters.

[0013] To retrieve a filename, the invention first converts the retrieval request into Unicode text encoding based on the default text encoding of the system. The invention then searches the computer memory for a matching Unicode text encoded filename. If the search is successful, the search result is returned. If the search is not successful, the invention determines if Roman text encoding is the default text encoding. If Roman text encoding is not the default text encoding, the invention uses Roman text encoding to convert the retrieval request into Unicode text encoding and then searches the computer memory for a matching Unicode filename. If the search is successful, a search result is returned.

[0014] If the search is not successful, or if Roman text encoding is the default text encoding, the invention next retrieves a list of all text encodings previously used in the system as specified in an Encoding Bitmap located in the computer memory of the system. The invention then converts the retrieval request into Unicode text encoding based on each text encoding specified in the encoding bitmap and uses each conversion to search the computer memory for a match. If a match is found, the invention returns the search result.

[0015] Finally, if the search is still not successful, the invention converts the retrieval request into Unicode text encoding based on any other text encodings installed in the computer memory that have yet to be tried. The invention then uses each conversion in searching the computer memory for a matching Unicode filename. If the search is successful, the invention returns the search result. If the search is not successful, the invention returns an error message.

[0016] Accordingly, the present invention not only more accurately and efficiently stores and retrieves filenames in computer memory but also allows multiple encodings to be used in computer memory over time.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] FIG. 1 is a diagram of Japanese characters in columns that appear identical when storing or retrieving a filename using a prior art system;

[0018] FIG. 2 is a diagram of ASCII text encodings used by a prior art system;

[0019] FIG. 3 is a block diagram of a computer system suitable for use with the present invention;

[0020] FIG. 4 is a block diagram of the preferred allocation of the memory shown in FIG. 3;

[0021] FIG. 5 is a block diagram of the preferred embodiment of the Unicode Table in the memory shown in FIG. 4;

[0022] FIG. 6 is a block diagram of the preferred embodiment of the Encoding Bitmap in the memory shown in FIG. 4;

[0023] FIG. 7 is a flowchart of preferred method steps for storing a filename into computer memory according to the present invention; and

[0024] FIGS. 8a and 8b are a flowchart of preferred method steps for retrieving a filename from computer memory according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0025] The present invention relates to an improvement in storing filenames in, and retrieving them from, computer memory.

[0026] FIG. 3 is a block diagram of a computer system suitable for use with the invention. Computer system 300 preferably includes a Central Processing Unit (CPU) 304, a monitor 306, a keyboard 308, memory 310, and an input and output (I/O) interface 312, all connected by a system bus 302. Memory 310 may comprise a hard disk drive, random access memory (RAM) or any other appropriate memory configuration.

[0027] FIG. 4 is a block diagram of the preferred allocation of memory 310, which stores a Unicode table 402 that contains 16-bit encodings for most modern written languages as discussed further in conjunction with FIG. 5. Memory 310 also stores a File Manager 404 which manages document 406 and other documents with their respective filenames that are stored in memory 310, as discussed further in conjunction with FIG. 7, FIG. 8a and FIG. 8b. Memory 310 also stores text encodings 408 for various languages such as Roman, Greek and Japanese, and an encoding bitmap 410 which lists all previously used text encodings, as discussed further in conjunction with FIG. 6.

[0028] FIG. 5 is a diagram of the preferred embodiment of the Unicode Table 402, which contains bit encodings for most of the world's modern written languages. Unicode, published as The Unicode Standard, Worldwide Character Encoding, is now the standard for representing text. Unicode uses a 16-bit coding scheme that allows for 65,536 distinct characters, more than enough to include all languages in use today. Currently, Unicode text encoding covers 38,887 different characters. For example, the Roman character “A” 502 is represented by bit encoding 504. The Greek character “α” 506 is represented by bit encoding 508. The Chinese character for sky (“tian” in Mandarin Chinese and “tin” in Cantonese) 510 is represented by bit encoding 512. Most modern written languages can be encoded using Unicode text encoding. However, some relatively obscure languages in current use, such as Cherokee and Mongolian, cannot be encoded using Unicode text encoding. Accordingly, almost any filename can be accurately represented in its native language using Unicode text encoding instead of having to be converted, possibly inaccurately, to Roman characters.
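As a small illustration of the 16-bit encodings discussed above, the following sketch prints the Unicode values of three sample characters. The exact bit patterns shown in FIG. 5 are not reproduced in this text, so the values come from Python's built-in ord function; the Chinese character used for “sky” is assumed to be U+5929, and the helper name bit_encoding is hypothetical.

```python
SIXTEEN_BIT_LIMIT = 2 ** 16  # 65,536 distinct values in a 16-bit scheme


def bit_encoding(char: str) -> str:
    """Return the Unicode value of one character as a 16-digit bit string."""
    value = ord(char)
    if value >= SIXTEEN_BIT_LIMIT:
        raise ValueError("character does not fit in a 16-bit encoding")
    return format(value, "016b")


# Roman "A", Greek alpha, and the assumed character for "sky".
samples = {"A": 0x0041, "\u03b1": 0x03B1, "\u5929": 0x5929}
encodings = {char: bit_encoding(char) for char in samples}
```

Each of the three characters, from three different scripts, receives its own distinct 16-bit value in a single table, which is what lets one stored filename form serve all of them.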

[0029] FIG. 6 is a diagram of the preferred embodiment of the FIG. 4 Encoding Bitmap 410, which contains a list of all text encodings previously used in system 300. Whenever a given text encoding is used in system 300, file manager 404 sets a relevant field in encoding bitmap 410. For instance, if field 602 represents Hebrew and Hebrew has not been used in system 300, field 602 contains a 0. If field 604 represents Arabic and Arabic has been used in system 300, field 604 contains a 1.
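One possible representation of such a bitmap is a single integer with one bit per text encoding. The sketch below is illustrative only: the bit positions, the encoding labels, and the function names are assumptions and are not taken from FIG. 6.

```python
# Hypothetical assignment of bit positions to text encodings.
ENCODING_BITS = {"roman": 0, "japanese": 1, "hebrew": 2, "arabic": 3}


def mark_used(bitmap: int, encoding: str) -> int:
    """Return the bitmap with the field for `encoding` set to 1."""
    return bitmap | (1 << ENCODING_BITS[encoding])


def is_used(bitmap: int, encoding: str) -> bool:
    """Report whether the field for `encoding` contains a 1."""
    return bool(bitmap & (1 << ENCODING_BITS[encoding]))


def used_encodings(bitmap: int) -> list:
    """List every encoding whose field is set, in bit order."""
    ordered = sorted(ENCODING_BITS, key=ENCODING_BITS.get)
    return [name for name in ordered if is_used(bitmap, name)]


bitmap = 0                           # no text encoding used yet
bitmap = mark_used(bitmap, "arabic")  # Arabic has now been used
```

In this sketch the Hebrew field stays 0 and the Arabic field becomes 1, mirroring the example of fields 602 and 604.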

[0030] FIG. 7 is a flowchart of steps in a preferred method 700 for file manager 404 to store a filename into computer memory 310 according to the invention. In step 703, file manager 404 receives a “save” request, which contains filename information for document 406. Alternatively, the “save” request can be a request to change a filename. In step 704, file manager 404 creates a file and/or saves document 406 in memory 310. If the save request in step 703 was a change filename request, step 704 can be skipped. The contents of the document 406 can also be saved in memory 310 after completion of the method 700.

[0031] In step 706, file manager 404 determines a default text encoding of system 300, which in this case is a text encoding used to view filenames on monitor 306. In step 708, file manager 404 uses the default text encoding determined in step 706 to convert the filename to a Unicode name.

[0032] Step 710 determines whether the step 708 conversion using the default text encoding was successful. If the step 708 conversion was not successful, then in step 712 file manager 404 uses Roman text encoding to convert the user-entered filename to Unicode text encoding. Note that step 712 cannot fail. Even if the filename was not actually written in Roman characters, method 700 will still convert the user-entered filename to Unicode using Roman encoding. This is because all possible byte sequences yield valid Roman characters that can be converted into Unicode. The filename will not be in the intended characters, but the filename will be individually distinguishable.

[0033] Once the step 712 conversion is complete, or if the step 708 conversion was successful, then in step 714 file manager 404 saves the Unicode name to memory 310. In step 716, file manager 404 sets a bit in encoding bitmap 410 that corresponds to the type of text encoding used to convert the user-entered filename. In step 718 method 700 ends.
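Steps 708 through 716 of method 700 can be sketched as follows. This is a minimal illustration under stated assumptions, not the file manager's actual implementation: Python's "latin-1" codec stands in for Roman text encoding because it assigns a character to every possible byte, "shift_jis" stands in for a Japanese default text encoding, and plain Python sets stand in for memory 310 and encoding bitmap 410. The function name store_filename is hypothetical.

```python
ROMAN = "latin-1"  # assumed stand-in: decodes every possible byte sequence


def store_filename(raw_name: bytes, default_encoding: str,
                   stored_names: set, encodings_used: set) -> str:
    """Convert a filename to Unicode and record which encoding was used."""
    try:
        # Step 708: convert using the default text encoding.
        unicode_name = raw_name.decode(default_encoding)
        encoding_used = default_encoding
    except UnicodeDecodeError:
        # Steps 710 and 712: the default failed, so fall back to Roman,
        # which cannot fail even if the characters come out wrong.
        unicode_name = raw_name.decode(ROMAN)
        encoding_used = ROMAN
    stored_names.add(unicode_name)      # step 714: save the Unicode name
    encodings_used.add(encoding_used)   # step 716: set the bitmap field
    return unicode_name


stored_names, encodings_used = set(), set()
japanese_name = "\u5929.txt".encode("shift_jis")
first = store_filename(japanese_name, "shift_jis", stored_names, encodings_used)
# 0xFF is not a valid Shift-JIS byte, so this name takes the Roman path.
second = store_filename(b"\xff\xfe", "shift_jis", stored_names, encodings_used)
```

The second name is stored as two Roman characters rather than the intended text, but it remains a distinct, retrievable key, which matches the behavior described for step 712.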

[0034] FIGS. 8a and 8b are a flowchart of steps in a preferred method 800 for file manager 404 to retrieve a filename from computer memory according to the invention. In step 804 file manager 404 receives a search request which was generated when a system 300 user attempted to open document 406, or any other document, stored in memory 310. The search request contains a user-entered filename. In step 805 file manager 404 converts the user-entered filename to Unicode text encoding based on the default text encoding of system 300. As discussed in conjunction with FIG. 7, the default text encoding in this example is the text encoding used to view filenames on monitor 306. If the step 805 conversion was not successful, then file manager 404 proceeds to step 816 as discussed below. If the conversion was successful, then in step 807 file manager 404 searches memory 310 for the converted filename. If file manager 404 locates a matching filename, file manager 404 returns the search result and retrieves the file having the matching filename in step 812 and method 800 ends in step 814.

[0035] If the step 807 search did not locate a matching filename, or if the step 805 conversion was not successful, then in step 816 file manager 404 determines if Roman text encoding is the default text encoding of system 300. If Roman text encoding is not the default text encoding, then in step 817 file manager 404 converts the user-entered filename to Unicode text encoding using Roman text encoding. In step 819, file manager 404 searches memory 310 for the converted filename. If it finds a matching filename, then in step 822 file manager 404 returns a search result and retrieves the file having the matching filename, and method 800 ends in step 824.

[0036] If the step 819 search did not locate a matching filename, or if in step 816 file manager 404 determined that Roman text encoding is the default text encoding of system 300, then in step 826 file manager 404 retrieves a list of text encodings from encoding bitmap 410.

[0037] Next, in step 827, file manager 404 converts the user-entered filename into Unicode text encoding using a text encoding from the list retrieved in step 826 from encoding bitmap 410. File manager 404 converts the filename into Unicode using only text encodings not already used in steps 805 and 817. However, in practice system 300 will probably only have installed one or two text encodings, usually Roman and a local text encoding such as Japanese. The local text encoding is normally set as the default text encoding that is tried in step 805. Therefore, method 800 generally is successful at either step 808 or step 820 and does not reach step 826.

[0038] If the step 827 conversion is not successful, then file manager 404 proceeds to step 834. If the step 827 conversion is successful, then in step 829 file manager 404 uses the converted user-entered filename to search memory 310 for a matching Unicode filename. If in step 830 the search is successful, then in step 832 file manager 404 returns a search result and retrieves the file having the matching filename, and in step 833 method 800 ends. If in step 830 the search was unsuccessful, or if the step 827 conversion was unsuccessful, then in step 834 file manager 404 determines if there are other text encodings listed in encoding bitmap 410 that have not been tried. If there are some text encodings that have not yet been tried, then file manager 404 returns to step 827.

[0039] If in step 834 all text encodings listed in encoding bitmap 410 have been tried, then file manager 404 proceeds to step 835 and tries to convert the user-entered filename into Unicode text encoding based on any other text encodings installed in system 300. As in step 827, file manager 404 tries conversions to Unicode text encoding using only previously untried text encodings. If the step 835 conversion is unsuccessful, then file manager 404 proceeds to step 844. Otherwise, in step 837, file manager 404 searches memory 310 for a matching Unicode filename. If the search is successful, then in step 840 file manager 404 returns a search result and retrieves the file having the matching filename, and in step 842 method 800 ends. If the search is unsuccessful, but in step 844 not all text encodings have been tried, then file manager 404 returns to step 835 and tries to convert the user-entered filename to Unicode text encoding using another text encoding. If in step 844 all the text encodings installed in system 300 have been tried, then in step 846 file manager 404 returns an error result and in step 848 the method 800 halts.
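The search order of method 800 can be summarized in a short sketch: the default text encoding first, then Roman, then the encodings listed in the encoding bitmap, then any other installed encodings, skipping encodings already tried. The sketch makes the same illustrative assumptions as before: "latin-1" stands in for Roman text encoding, Python codec names stand in for text encodings 408, a set of Unicode strings stands in for memory 310, and the function name find_filename is hypothetical.

```python
ROMAN = "latin-1"  # assumed stand-in for Roman text encoding


def find_filename(raw_request: bytes, default_encoding: str,
                  stored_names: set, bitmap_encodings: list,
                  installed_encodings: list):
    """Return the matching stored Unicode filename, or None (error result)."""
    # Order of attempts: default (step 805), Roman (step 817),
    # previously used encodings (step 827), other installed ones (step 835).
    candidates = [default_encoding, ROMAN]
    candidates += list(bitmap_encodings) + list(installed_encodings)
    tried = set()
    for encoding in candidates:
        if encoding in tried:
            continue  # only previously untried text encodings are used
        tried.add(encoding)
        try:
            unicode_name = raw_request.decode(encoding)
        except UnicodeDecodeError:
            continue  # conversion unsuccessful, move to the next encoding
        if unicode_name in stored_names:
            return unicode_name  # search successful
    return None  # step 846: every installed encoding has been tried


stored = {"\u5929.txt"}
request = "\u5929.txt".encode("shift_jis")
# The system default is Roman here, so the match is found only after the
# Japanese encoding recorded in the bitmap is tried.
found = find_filename(request, ROMAN, stored, ["shift_jis"],
                      [ROMAN, "shift_jis", "iso8859_7"])
missing = find_filename(b"nope", ROMAN, stored, ["shift_jis"],
                        [ROMAN, "shift_jis", "iso8859_7"])
```

Collapsing the four stages into one ordered candidate list keeps the sketch short; the flowchart of FIGS. 8a and 8b expresses the same order as separate loops with explicit decision steps.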

[0040] The invention has been explained with reference to a preferred embodiment. Other embodiments will be apparent to those skilled in the art in light of this disclosure. For example, the invention may readily be implemented using configurations other than those described in the preferred embodiment. Additionally, the invention may effectively be used in conjunction with systems other than the one described as the preferred embodiment. Therefore, these and other variations upon the preferred embodiments are intended to be covered by the appended claims.

What is claimed is:
 1. A method for storing filenames in and retrieving filenames from a memory device, comprising the steps of: determining a default text encoding based on a text encoding used to display text on a monitor coupled to said memory device; converting a filename from said default text encoding to Unicode text encoding; storing the converted filename in said memory device; receiving a retrieval request; converting said retrieval request from said default text encoding to Unicode text encoding; searching said memory device for a filename which matches the converted retrieval request; retrieving the file having the matching filename from said memory device; and modifying an encoding bitmap to indicate the use of said default text encoding.
 2. The method of claim 1 wherein said step of converting a filename is unsuccessful, and including the further step of next converting said filename to Unicode text encoding based on Roman text encoding.
 3. The method of claim 2 wherein said step of converting said retrieval request is unsuccessful and including the further steps of: determining if Roman text encoding is a default text encoding in said memory device; and if Roman text encoding is not a default text encoding, then using Roman text encoding to convert said retrieval request to Unicode text encoding.
 4. The method of claim 3 wherein said step of searching is unsuccessful and including the further steps of: retrieving a list of previously-used text encodings from said encoding bitmap; using text encodings from said list to convert said retrieval request to Unicode text encoding; and repeating said step of searching.
 5. The method of claim 4 wherein said step of searching is unsuccessful and including the further steps of: converting said retrieval request to Unicode text encoding based on any text encodings installed in said memory device; and repeating said step of searching.
 6. The method of claim 4 wherein said step of converting said retrieval request is unsuccessful and including the further step of converting said retrieval request to Unicode text encoding based on any text encodings installed in said memory device.
 7. The method of claim 2 wherein said step of searching is unsuccessful and including the further steps of: determining if Roman text encoding is a default text encoding in said memory device; if Roman text encoding is not a default text encoding, then using Roman text encoding to convert said retrieval request to Unicode text encoding; and repeating said step of searching.
 8. The method of claim 1 wherein said searching step is unsuccessful and including the further steps of: converting said filename to Unicode text encoding based on Roman text encoding; and repeating said step of searching.
 9. A system for storing filenames in and retrieving filenames from a memory device, comprising: means for determining a default text encoding based on a text encoding used to display text on a monitor coupled to said memory device; means for converting a filename from said default text encoding to Unicode text encoding; means for storing the converted filename in said memory device; means for receiving a retrieval request; means for converting said retrieval request from said default text encoding to Unicode text encoding; means for searching said memory device for a filename which matches the converted retrieval request; means for retrieving the file having the matching filename from said memory device; and means for modifying an encoding bitmap to indicate the use of said default text encoding.
 10. The system of claim 9 wherein said means for converting a filename is unsuccessful, and including further means for next converting said filename to Unicode text encoding based on Roman text encoding.
 11. The system of claim 10 wherein said means for converting said retrieval request is unsuccessful and further including: means for determining if Roman text encoding is a default text encoding in said memory device; and if Roman text encoding is not a default text encoding, then means for using Roman text encoding to convert said retrieval request to Unicode text encoding.
 12. The system of claim 11 wherein said means for searching is unsuccessful and further including: means for retrieving a list of previously-used text encodings from said encoding bitmap; means for using text encodings from said list to convert said retrieval request to Unicode text encoding; and means for repeating said means for searching.
 13. The system of claim 12 wherein said means for searching is unsuccessful and further including: means for converting said retrieval request to Unicode text encoding based on any text encodings installed in said memory device; and means for repeating said step of searching.
 14. The system of claim 12 wherein said means for converting said retrieval request is unsuccessful and further including means for converting said retrieval request to Unicode text encoding based on any text encodings installed in said memory device.
 15. The system of claim 10 wherein said means for searching is unsuccessful and further including: means for determining if Roman text encoding is a default text encoding in said memory device; means for if Roman text encoding is not a default text encoding, then using Roman text encoding to convert said retrieval request to Unicode text encoding; and means for repeating said step of searching.
 16. The system of claim 9 wherein said means for searching is unsuccessful and further including: means for converting said filename to Unicode text encoding based on Roman text encoding; and means for repeating said step of searching.
 17. A computer-readable medium for storing instructions for causing a computer to perform the steps of: determining a default text encoding based on a text encoding used to display text on a monitor coupled to a memory device; converting a filename from said default text encoding to Unicode text encoding; storing the converted filename in said memory device; receiving a retrieval request; converting said retrieval request from said default text encoding to Unicode text encoding; searching said memory device for a filename which matches the converted retrieval request; retrieving the file having the matching filename from said memory device; and modifying an encoding bitmap to indicate the use of said default text encoding.
 18. The computer-readable medium of claim 17 wherein said step of converting a filename is unsuccessful, and including the further step of next converting said filename to Unicode text encoding based on Roman text encoding.
 19. The computer-readable medium of claim 18 wherein said step of converting said retrieval request is unsuccessful and including the further steps of: determining if Roman text encoding is a default text encoding in said memory device; and if Roman text encoding is not a default text encoding, then using Roman text encoding to convert said retrieval request to Unicode text encoding.
 20. The computer-readable medium of claim 19 wherein said step of searching is unsuccessful and including the further steps of: retrieving a list of previously-used text encodings from said encoding bitmap; using text encodings from said list to convert said retrieval request to Unicode text encoding; and repeating said step of searching.
 21. The computer-readable medium of claim 20 wherein said step of searching is unsuccessful and including the further steps of: converting said retrieval request to Unicode text encoding based on any text encodings installed in said memory device; and repeating said step of searching.
 22. The computer-readable medium of claim 20 wherein said step of converting said retrieval request is unsuccessful and including the further step of converting said retrieval request to Unicode text encoding based on any text encodings installed in said memory device.
 23. The computer-readable medium of claim 18 wherein said step of searching is unsuccessful and including the further steps of: determining if Roman text encoding is a default text encoding in said memory device; if Roman text encoding is not a default text encoding, then using Roman text encoding to convert said retrieval request to Unicode text encoding; and repeating said step of searching.
 24. The computer-readable medium of claim 17 wherein said searching step is unsuccessful and including the further steps of: converting said filename to Unicode text encoding based on Roman text encoding; and repeating said step of searching.